Automated zone correction in bitmapped document images

Authors

  • Susan E. Hauser
  • Daniel X. Le
  • George R. Thoma
Abstract

The optical character recognition (OCR) system selected by the National Library of Medicine (NLM) as part of its system for automating the production of MEDLINE® records frequently segments scanned page images into zones that are inappropriate for NLM's application. Software has been developed in-house to correct the zones using the character coordinate and character attribute information provided in the OCR output data. The software correctly delineates over 97% of the zones of interest tested to date.
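
The abstract does not describe the correction algorithm itself. As a purely illustrative sketch of how per-character coordinates and attributes could be used to redraw zones, the Python below splits a reading-order character stream wherever an attribute changes and takes the bounding box of each run. The `Char` record, its field names, and the choice of font size as the attribute are all assumptions made for this example; they are not taken from the NLM software or from any particular OCR output format.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple


@dataclass(frozen=True)
class Char:
    # Hypothetical record: the real OCR output fields are not given in the abstract.
    text: str
    left: int
    top: int
    right: int
    bottom: int
    font_size: int  # stands in for "character attribute information"


Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def bounding_box(chars: List[Char]) -> Box:
    """Return the smallest rectangle enclosing every character box."""
    return (
        min(c.left for c in chars),
        min(c.top for c in chars),
        max(c.right for c in chars),
        max(c.bottom for c in chars),
    )


def rezone_by_font_size(chars: List[Char]) -> List[Box]:
    """Split a reading-order character stream at each font-size change
    and return one bounding box per run (an assumed heuristic)."""
    runs = groupby(chars, key=lambda c: c.font_size)
    return [bounding_box(list(run)) for _, run in runs]


# Two large characters (e.g. a title) followed by two small ones (e.g. body text).
chars = [
    Char("T", 10, 10, 30, 40, 18),
    Char("i", 32, 10, 40, 40, 18),
    Char("a", 10, 60, 20, 72, 10),
    Char("b", 22, 60, 32, 72, 10),
]
zones = rezone_by_font_size(chars)
```

In this toy input the single incoming zone is split into two boxes, one per font size. A real corrector would likely combine several attributes and spatial rules rather than a single attribute change.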

Similar resources

A Semi-Automated Algorithm for Segmentation of the Left Atrial Appendage Landing Zone: Application in Left Atrial Appendage Occlusion Procedures

Background: Mechanical occlusion of the left atrial appendage (LAA) using a purpose-built device has emerged as an effective prophylactic treatment in patients with atrial fibrillation who are at risk of stroke and have a contraindication for anticoagulation. A crucial step in procedural planning is the choice of the device size. This is currently based on the manual analysis of the “Device Landing Zone” fr...

On-the-fly Hyperlink Creation for Page Images

Hypertext is an appealing interface for digital libraries, but using existing paper documents to build such a library poses several challenges. We describe a system for creating hypertext links on the fly in a library composed of bitmapped images of paper documents and text derived from those images by optical-character recognition. We present two simple ideas: text-image maps coordinate text a...

Automated Document Labeling Using Integrated Image and Neural Processing

As part of our effort to develop an automated data entry system to identify and convert bibliographic information from paper-based documents to electronic format for inclusion in the MEDLINE database used worldwide by biomedical researchers and clinicians, we have implemented a new technique for automatically labeling zones from scanned images with meaningful labels such as title, author, affi...

The Segmentation and Identification of Handwriting in Noisy Document Images

In this paper we present an approach to the problem of segmenting and identifying handwritten annotations on noisy document images. In many types of documents such as correspondence, it is not uncommon for handwritten annotations to be added as part of a note, correction, clarification, or instruction, or for initials or a signature to appear as an authentication mark. It is important to be abl...

Automated Segmentation of Math-Zones from Document Images

To support high-level understanding of the mathematical content in a document image, techniques for math-zone extraction and recognition are clearly required. In this paper we present a fully automatic segmentation of displayed-math zones from the document image, using only the spatial layout information of math-formulas and equations, so as to help commercial OCR systems which cannot dis...

Publication date: 2000